31.1 Ontology

Nevertheless, the sheer volume of data (sequences and structures) emerging from

experimental molecular biology is a powerful driver for treating it ontologically in

order to allow human beings, and machines, to make some sense of it. Without an

ontology the mass of data would be unstructured and, hence, overwhelming to the

human mind, for it would be very difficult to discern meaningful paths through it.

In bioinformatics, ontology typically has a more restricted definition, namely “a

working model of entities and interactions”. 4 These models would include a glossary

of terms as a basic part. Other components of a model are generally considered to be

the following (note that there has been little attempt by ontologists to define these

words carefully and unambiguously): classes or categories (sets of objects); attributes

or concepts, which may be either primitive (necessary conditions for membership of

a class) or defined (necessary and sufficient conditions for membership); arbitrary

rules (sometimes called axioms) constraining class membership, which might be

considered to be part of the glossary of terms; relations (between classes or concepts),

which might be either taxonomic (hierarchical) or associative; instantiations

(concrete examples; i.e., individual objects); and events that change attributes, or

relations, or both.
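
The components listed above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the names `OntClass`, `Instance`, `admits` and `phosphorylation_event`, and the example classes and attribute keys, are hypothetical and are not taken from any published ontology. Defining attributes are treated as necessary and sufficient conditions, the parent link is the taxonomic relation, and associative relations are stored as simple triples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    """A class (category): a named set of objects."""
    name: str
    parent: Optional["OntClass"] = None  # taxonomic (hierarchical) relation
    # "Defined" attributes, treated here as necessary and sufficient conditions
    defining: dict = field(default_factory=dict)

    def ancestors(self):
        """Walk up the taxonomic hierarchy, nearest parent first."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def admits(self, instance):
        """Rule (axiom): the instance must meet this class's conditions
        and every condition inherited from its ancestors."""
        conditions = {}
        for cls in [*reversed(list(self.ancestors())), self]:
            conditions.update(cls.defining)
        return all(instance.attributes.get(k) == v for k, v in conditions.items())


@dataclass
class Instance:
    """An instantiation: one concrete, individual object."""
    name: str
    attributes: dict = field(default_factory=dict)


def phosphorylation_event(instance):
    """An event: something that changes an attribute of an instance."""
    instance.attributes["phosphorylated"] = True


# Hypothetical two-level taxonomy: Enzyme is a kind of Protein
protein = OntClass("Protein", defining={"polymer_of": "amino acids"})
enzyme = OntClass("Enzyme", parent=protein, defining={"catalytic": True})

hexokinase = Instance("hexokinase", {"polymer_of": "amino acids", "catalytic": True})
collagen = Instance("collagen", {"polymer_of": "amino acids", "catalytic": False})

# Associative (non-hierarchical) relations as subject-relation-object triples
associations = [("hexokinase", "acts_on", "glucose")]

print(enzyme.admits(hexokinase), enzyme.admits(collagen))
phosphorylation_event(collagen)
print(collagen.attributes)
```

Keeping the membership rule inside the class, rather than in a separate rule engine, is merely the shortest way to show how rules constrain class membership; a real ontology language would express such axioms declaratively.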

An ontology, which belongs to the category of semantics, is necessarily subordinate

to the rules for its construction, which belong to the category of inference, much as

a system of classification (Sect. 31.2) depends on rules. The ontology is in turn

superordinate to mark-up, which belongs to the category of syntax. A familiar mark-up

technology is XML (“extensible mark-up language”).

Mark-up is in turn superordinate to encoding in a form suitable for the computer. 5
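
The layering of semantics, syntax and encoding can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal illustration, not a recognized ontology format: the element and attribute names (`ontology`, `class`, `subclass_of`, `instance`, `of`) are assumptions invented for the example, as is the choice of UTF-8 for the encoding step.

```python
import xml.etree.ElementTree as ET

# Semantics: a tiny, hypothetical ontology fragment (one class, one instance).
# Syntax: the same content expressed as XML mark-up.
root = ET.Element("ontology")
cls = ET.SubElement(root, "class", {"name": "Enzyme", "subclass_of": "Protein"})
ET.SubElement(cls, "attribute", {"name": "catalytic", "value": "true"})
ET.SubElement(root, "instance", {"name": "hexokinase", "of": "Enzyme"})

# Mark-up as a string of characters
markup = ET.tostring(root, encoding="unicode")

# Encoding: the characters become bytes, the form the computer stores
encoded = markup.encode("utf-8")

# Reading back up the layers: bytes -> mark-up -> ontology content
decoded = ET.fromstring(encoded.decode("utf-8"))
instance_names = [e.get("name") for e in decoded.iter("instance")]

print(markup)
print(len(encoded), "bytes")
print(instance_names)
```

The same ontology content could be marked up in a different syntax, or the same mark-up stored in a different encoding, without changing the layer above it.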

Mark-up is essential for realizing the Semantic Web, an extension of the World

Wide Web that enables machines to “understand” the meaning of data on the web. 6

The Semantic Web comprises data stored in a standard format and linked with relationships

that might allow machines to interpret the data, enabling them to identify

and extract relationships between different pieces of data and use these to draw new

4 Each different model—such as RiboWeb, EcoCyc—is typically called an “ontology”; hence, we

have the Gene Ontology, the Transparent Access to Multiple Bioinformatics Information Sources

(TAMBIS) Ontology (Baker et al. 1999), and so forth. If ontology is given the restricted meaning

of the study of classes of objects, then “an ontology” like TAMBIS can be considered to be the

product of ontological inquiry.

5 It is worth noting that many of these matters were tackled long ago by chemists; databases such

as Beilstein and Chemical Abstracts have existed for more than a century, and encoding complex

molecular structures (albeit much simpler than a protein) as a string of characters has been achieved

using SMILES (simplified molecular-input line-entry system). See the Handbook of Chemoinformatics:

From Data to Knowledge (ed. J. Gasteiger) in four volumes (Wiley-VCH, Weinheim, 2003), for a

comprehensive overview.

6 Machines can understand data in the sense that they can interpret and analyse it, using algorithms

and statistical methods to uncover patterns and relationships. They can process large datasets,

identify correlations between different variables, and draw conclusions from the data; these

conclusions may seem surprising and revelatory because no human being can hold

such large quantities of data in mind.